Automatically Discovering Semantic Links among Documents and Applications
Abstract
Automatically discovering semantic links among documents is the basis for developing advanced applications on large-scale documentary resources. This paper proposes an approach to automatically discover semantic links in a given document set. It has the following advantages: (1) It does not rely on any predefined ontology. (2) The semantic link networks and relevant rules evolve automatically. (3) It can adapt to updates of the adopted techniques. Experiments on document sets of different types (scientific papers and Web pages on Dunhuang culture) and different scales show that the proposed approach is feasible. The approach can be used to automatically construct semantic overlays on large document sets to support advanced applications such as various relation queries on documents.
Similar resources
Automatically constructing semantic link network on documents
Knowing semantic links among resources is the basis of realizing machine intelligence over large-scale resources. Discovering semantic links among resources with limited human interference is a challenging issue. This paper proposes an approach to automatically discovering and predicting semantic links in a document set based on a model of document semantic link network (SLN). The approach has th...
Discovering Latent Graphs with Positive and Negative Links to Eliminate Spam in Adversarial Information Retrieval
This paper proposes a new direction in Adversarial Information Retrieval through automatically ranking links. We use techniques based on Latent Semantic Analysis to define a novel algorithm to eliminate spam sites. Our model automatically creates, suppresses, and reinforces links. Using an appropriately weighted graph, spam links are assigned substantially lower weights while links to normal sit...
Automatic Discovery of Semantic Structures in HTML Documents
Template-driven HTML documents possess an implicit, fixed schema denoting concepts and their relationships in a hierarchical fashion. Discovering this schema remains a relatively unexplored problem. By exploiting a key observation that semantically related items in HTML documents exhibit spatial locality, we develop an algorithm for automatically partitioning them into tree-like semantic structu...
Information Access via Topic Hierarchies and Thematic Annotations from Document Collections
With the development and availability of large textual corpora, there is a need to enrich and organize these corpora so as to make research and navigation among the documents easier. Semantic Web research focuses on augmenting ordinary Web pages with semantics. Indeed, a wealth of information exists today in electronic form, but it cannot be easily processed by computers due to la...
Discovering Missing Semantic Relations between Entities in Wikipedia
Wikipedia’s infoboxes contain rich structured information about various entities, which has been explored by the DBpedia project to generate large-scale Linked Data sets. Among all the infobox attributes, those having hyperlinks in their values identify semantic relations between entities, which are important for creating RDF links between DBpedia’s instances. However, quite a few hyperl...
Publication date: 2008